Mixing representation levels: The hybrid approach to automatic text generation
Natural language generation (NLG) systems map non-linguistic representations
into strings of words through a number of steps, using intermediate
representations at various levels of abstraction. Template-based systems, by
contrast, tend to use only one representation level, i.e. fixed strings, which
are combined, possibly in sophisticated ways, to generate the final text.
In some circumstances, it may be profitable to combine NLG and template based
techniques. The issue of combining generation techniques can be seen in more
abstract terms as the issue of mixing levels of representation of different
degrees of linguistic abstraction. This paper aims at defining a reference
architecture for systems using mixed representations. We argue that mixed
representations can be used without abandoning a linguistically grounded
approach to language generation.
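The hybrid idea of mixing fixed strings with more abstract representations can be sketched in a few lines. All names and the toy morphology below are invented for illustration, not taken from the paper's architecture:

```python
# Toy sketch of mixing representation levels: a "template" holds fixed
# strings (the template level) alongside abstract slots (dictionaries of
# features) that a tiny realizer turns into words.

def realize(item):
    """Realize one template element: fixed strings pass through
    unchanged; abstract slots are generated from their features."""
    if isinstance(item, str):
        return item                                   # fixed-string level
    lemma = item["lemma"]
    number = item.get("number", "sg")
    return lemma + ("s" if number == "pl" else "")    # naive English plural

# Mixed-level template: two canned strings around one abstract slot.
template = ["The system found", {"lemma": "error", "number": "pl"}, "in the log."]
print(" ".join(realize(x) for x in template))
```

The point of the sketch is only that both representation levels flow through one pipeline; a linguistically grounded system would replace the naive plural rule with a proper realizer.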
Exploiting Lexical Resources for Therapeutic Purposes: the Case of WordNet and STaRS.sys
In this paper, we present an on-going project aiming at extending the WordNet lexical database by encoding common-sense featural knowledge elicited from language speakers. Such an extension of WordNet is required in the framework of the STaRS.sys project, which has the goal of building tools to support the speech therapist during the preparation of exercises to be submitted to aphasic patients for rehabilitation purposes. We review some preliminary results and illustrate what extensions of the existing WordNet model are needed to accommodate the encoding of common-sense (featural) knowledge.
A Feature Type Classification for Therapeutic Purposes: A Preliminary Evaluation with Non-Expert Speakers
We propose a feature type classification intended to be used in a therapeutic context. Such a scenario lies behind our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both a practical and a theoretical outcome, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strengths of our proposal and to identify some improvements for the future.
Encoding Commonsense Lexical Knowledge into WordNet
In this paper, we propose an extension of the WordNet conceptual model, with the final purpose of encoding the common-sense lexical knowledge associated with words used in everyday life. The extended model has been defined starting from the short descriptions generated by naïve speakers in relation to target concepts (i.e. feature norms). Even if this proposal has been developed primarily for therapeutic purposes, it can be seen as a generalization of the original WordNet model that takes into account a much wider and more systematic set of semantic relations. The extended model also enhances the psycholinguistic vocation of the WordNet model: a featural representation of concepts is nowadays assumed by most models of human semantic memory. To test our proposal, we conducted a feature elicitation experiment and collected descriptions of 50 concepts from 60 participants. Problematic issues related to the encoding of this information into WordNet are discussed and preliminary results are presented.
Evaluating cross-language annotation transfer in the MultiSemCor corpus
In this paper we illustrate and evaluate an approach to the creation of high-quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor, texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
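The core transfer step, projecting sense labels across a word alignment, can be sketched as follows. The token structure, the alignment format, and the sense key in the example are all invented for illustration, not the actual MultiSemCor data format:

```python
# Sketch of cross-language annotation transfer: sense labels attached to
# source-language tokens are projected onto the target-language tokens
# they are word-aligned with; unaligned target tokens stay unlabeled.

def transfer_annotations(source, target, alignment):
    """source/target: lists of token dicts; alignment: list of
    (source_index, target_index) pairs acting as the bridge."""
    target_senses = {}
    for src_idx, tgt_idx in alignment:
        sense = source[src_idx].get("sense")
        if sense is not None:          # only annotated source tokens project
            target_senses[tgt_idx] = sense
    return [dict(tok, sense=target_senses.get(i))
            for i, tok in enumerate(target)]

# Toy English/Italian pair (invented alignment and sense key):
en = [{"form": "bank", "sense": "bank%1:14:00::"}, {"form": "account"}]
it = [{"form": "conto"}, {"form": "bancario"}]
alignment = [(0, 1), (1, 0)]           # crossing alignment: bank->bancario
print(transfer_annotations(en, it, alignment))
```

The sketch deliberately ignores the hard parts the paper evaluates, such as alignment errors and translation shifts, which is where most transfer noise comes from.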
English/Veneto Resource Poor Machine Translation with STILVEN
The paper reports ongoing work on the implementation of a
system for automatic translation from English to Veneto and
vice versa. The system cannot rely on parallel texts because
such manual translations are almost non-existent. The
project is called STILVEN and is financed by the Regional
Authorities of the Veneto Region in Italy. After the first
year of activities, we managed to produce a prototype which
handles Venetan questions whose structure is very close to
English. We present problems related to Veneto, the basic
ideas, their implementation, and the results obtained.
VenPro: A Morphological Analyzer for Venetan
This document reports the process of extending MorphoPro for Venetan, a lesser-used language spoken in the north-eastern part of Italy.
MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this
component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesize
Venetan words. This task was challenging for several reasons, which are common to a number of lesser-used languages: although
Venetan is widely used as an oral language in everyday life, its written usage is very limited; efforts to define a standard orthography
and grammar are very recent and not well established; and, despite recent attempts to propose a unified orthography, no Venetan standard
is widely used. Moreover, Venetan has different geographical varieties and is strongly influenced by Italian.
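A declarative representation of morphological knowledge, as described above, typically pairs a stem lexicon with inflection paradigms that one engine can run in both directions (analysis and synthesis). The paradigm, stem, and forms below are an invented toy, not actual Venetan morphology or the MorphoPro rule format:

```python
# Minimal sketch of a declarative morphological lexicon: paradigms map
# feature tags to suffixes, and each stem names its paradigm. The same
# data drives both synthesis (generation) and analysis.

PARADIGMS = {
    "noun-o": {"sg": "o", "pl": "i"},   # invented toy paradigm
}
LEXICON = {"gat": "noun-o"}             # hypothetical stem entry

def synthesize(stem, tag):
    """Generate a surface form from a stem and a feature tag."""
    return stem + PARADIGMS[LEXICON[stem]][tag]

def analyze(form):
    """Return all (stem, tag) analyses whose synthesis yields the form."""
    return [(stem, tag)
            for stem, paradigm in LEXICON.items()
            for tag, suffix in PARADIGMS[paradigm].items()
            if form == stem + suffix]

print(synthesize("gat", "pl"))
print(analyze("gati"))
```

Because the knowledge is declarative data rather than code, adding a new variety or a competing orthography means adding paradigms and stems, not rewriting the engine, which is exactly what makes the approach attractive for a language without an established standard.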
Recovering from Failure with the GraFo Left Corner Parser
GraFo is a left corner parser for Italian, based on explicit rules manually coded in a unification formalism. As the linguistic coverage of GraFo is still quite limited, the parser produces complete parse trees for only a small percentage of sentences. This paper presents a number of strategies to recover from GraFo parsing failures. The various techniques have been evaluated on the data provided by the EVALITA 2007 evaluation campaign.
Extending WordNet with Syntagmatic Information
In this paper we present a proposal to extend WordNet-like lexical databases by adding information about the co-occurrence of word meanings in texts. More specifically, we propose to add phrasets, i.e. sets of free combinations of words which are recurrently used to express a concept (let's call them Recurrent Free Phrases). Phrasets are a useful source of information for different NLP tasks, and particularly, in a multilingual environment, to manage lexical gaps. At least a part of recurrent free phrases can also be represented through a new set of syntagmatic (lexical and semantic) WordNet relations.
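A phraset can be pictured as an extra field on a synset-like record that supplies free phrases when no single-word lexicalization exists. The record layout, the synset ID, and the gap example below are assumptions for illustration, not the actual database schema:

```python
# Sketch of a synset record extended with a phraset: when a language has
# no single word for a concept (a lexical gap), the phraset supplies
# recurrent free phrases expressing it.

from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str                               # hypothetical ID scheme
    lemmas: list                                 # lexicalized words (may be empty)
    phraset: list = field(default_factory=list)  # recurrent free phrases

# Assumed example of an English-to-Italian gap: no single Italian word
# for "toe", which is recurrently expressed as "dito del piede".
it_toe = Synset(synset_id="toe-n#1", lemmas=[], phraset=["dito del piede"])

def lexicalizations(synset):
    """Prefer proper lexical units; fall back to phraset phrases on a gap."""
    return synset.lemmas or synset.phraset

print(lexicalizations(it_toe))
```

In a multilingual setting this fallback is what lets a transfer or generation component always produce some target-language expression, even across lexical gaps.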